Name: Huimiao Chen, JHED ID: hchen185, Email: hchen185@jhu.edu.
1. Look at the [chapter on interactive graphics](https://smart-stats.github.io/ds4bio_book/book/_build/html/interactive.html) and, specifically, the code to display a subject's MRICloud data as a sunburst plot. Do the following. Display this subject's data as a [Sankey diagram](https://plotly.com/python/sankey-diagram/). Display as many levels as you can (at least 3) for Type = 1, starting from the intracranial volume. Put this in a file called hw4.ipynb.

```python
import pandas as pd
import numpy as np
import plotly.graph_objects as go
import random

# load data and pre-process
## data urls
url_1 = "https://raw.githubusercontent.com/smart-stats/ds4bio_book/main/book/assetts/kirby21AllLevels.csv"
url_2 = "https://raw.githubusercontent.com/bcaffo/MRIcloudT1volumetrics/master/inst/extdata/multilevel_lookup_table.txt"

## load in the hierarchy information
multilevel_lookup = pd.read_csv(url_2, sep="\t").drop(['Level5'], axis=1)
multilevel_lookup = multilevel_lookup.rename(columns={
    "modify": "roi",
    "modify.1": "level4",
    "modify.2": "level3",
    "modify.3": "level2",
    "modify.4": "level1"})
multilevel_lookup = multilevel_lookup[['roi', 'level4', 'level3', 'level2', 'level1']]

## load in the subject data
subject_id = 127  # renamed from `id` to avoid shadowing the builtin
subjectData = pd.read_csv(url_1)
subjectData = subjectData.loc[(subjectData.type == 1) & (subjectData.level == 5) & (subjectData.id == subject_id)]
subjectData = subjectData[['roi', 'volume']]

## merge the subject data with the multilevel data
subjectData = pd.merge(subjectData, multilevel_lookup, on="roi")
subjectData = subjectData.assign(icv="ICV")
subjectData = subjectData.assign(comp=subjectData.volume / np.sum(subjectData.volume))

## print data
print(subjectData)

# prepare data for Sankey diagram
## initialize an empty dictionary
data_dict = {}

## prepare node names and colors
data_dict["node"] = {"label": [], "color": []}

### prepare node names
cols = ["icv", "level1", "level2", "level3", "level4", "roi"]
node_matrix = subjectData.loc[:, cols].values
for i in range(len(node_matrix)):
    # add level names as prefixes to avoid same names from different levels
    for j, level in enumerate(cols):
        node_matrix[i][j] = level + ":" + node_matrix[i][j]  # the prefixes will be deleted later
node_names = node_matrix.flatten(order='F').tolist()
data_dict["node"]["label"] = list(set(node_names))

### prepare node colors
opacity_node = 0.8
# generate RGBA color tuples with opacity
colors = [(random.randint(0, 255), random.randint(0, 255), random.randint(0, 255), opacity_node)
          for i in range(len(data_dict["node"]["label"]))]
# convert the RGBA color tuples to a list of RGBA color strings
color_strings = ['rgba({}, {}, {}, {})'.format(r, g, b, a) for r, g, b, a in colors]
data_dict["node"]["color"] = color_strings

## prepare link sources, targets, values, and colors
data_dict["link"] = {"source": [], "target": [], "value": [], "color": []}
label_No = {}  # initialize node numbers dict
for index_, label_ in enumerate(data_dict["node"]["label"]):
    label_No[label_] = index_  # assign numbers to nodes
cols = ["icv", "level1", "level2", "level3", "level4", "roi", "comp"]
subjectData_new = subjectData.loc[:, cols]
for i in subjectData_new.index:
    for j in subjectData_new.columns[1:-1]:  # exclusive of "icv" and "comp"
        parent = cols[cols.index(j) - 1]
        source_name = parent + ":" + subjectData_new.loc[i, parent]
        source_number = label_No[source_name]
        target_name = j + ":" + subjectData_new.loc[i, j]
        target_number = label_No[target_name]
        # total composition of the target node (sum only the numeric "comp" column)
        value = subjectData_new.groupby(j)["comp"].sum().loc[subjectData_new.loc[i, j]]
        # check whether this link has already been stored
        if_break = False
        for a, b, c in zip(data_dict["link"]["source"], data_dict["link"]["target"], data_dict["link"]["value"]):
            if a == source_number and b == target_number and c == value:
                if_break = True
        if not if_break:
            data_dict["link"]["source"].append(source_number)
            data_dict["link"]["target"].append(target_number)
            data_dict["link"]["value"].append(value)
opacity_link = 0.4
# links reuse the color of their source node, with a lower opacity
data_dict["link"]["color"] = [data_dict["node"]["color"][src].replace(str(opacity_node), str(opacity_link))
                              for src in data_dict["link"]["source"]]
for index_, label_ in enumerate(data_dict["node"]["label"]):
    # the prefixes of node names are deleted
    data_dict["node"]["label"][index_] = label_.split(":", 1)[1]

# plot Sankey diagram
fig = go.Figure(data=[go.Sankey(
    # valueformat = ".0000f",
    valuesuffix="(comp)",
    # Define nodes
    node=dict(
        pad=15,
        thickness=15,
        line=dict(color="black", width=0.5),
        label=data_dict['node']['label'],
        color=data_dict['node']['color']
    ),
    # Add links
    link=dict(
        source=data_dict['link']['source'],
        target=data_dict['link']['target'],
        value=data_dict['link']['value'],
        color=data_dict['link']['color']
    ))])
fig.update_layout(title_text="Structure of the intracranial volume<br>Source: Multi-level MRICloud data: <a href='https://raw.githubusercontent.com/smart-stats/ds4bio_book/main/book/assetts/kirby21AllLevels.csv'>kirby21AllLevels</a>", font_size=10)
fig.show()
```

2. Create a simple webpage containing this graphic and host it on GitHub Pages. Do not host this off of your assignment repo from GitHub Classroom, since this is not public. Instead, you'll have to create a new public repo from your regular GitHub account and add this file. Put the link to your live web page in a markdown cell of your hw4.ipynb file. Note, an easy way to create a webpage with this graphic is to export an ipynb as an html file.

```python
fig.write_html(file="HW4Q1.html")
```

3. Create the opioid sqlite database from [https://smart-stats.github.io/ds4bio_book/book/_build/html/sqlite.html](https://smart-stats.github.io/ds4bio_book/book/_build/html/sqlite.html). However, only go to the step where the csv files are read into the database. Then exit sqlite and you should have a file `opioid.db` that has the data. Next, read the three tables into pandas dataframes and do the data wrangling from the sqlite chapter directly in pandas.
Add the python code to your hw4.ipynb file.

4. Create an interactive scatter plot of the average number of opioid pills by year using plotly. [See the example here](https://www.opencasestudies.org/ocs-bp-opioid-rural-urban/#Data_Import). Don't do the intervals (little vertical lines), only the points. Add your plot to an html file with your repo for your Sankey diagram and host it publicly. Put a link to your hosted file in a markdown cell of your hw4.ipynb file. Note, an easy way to create a webpage with this graphic is to export an ipynb as an html file.

```python
import plotly.express as px
```
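
The notebook stops before any code for questions 3 and 4, so the following is only a minimal sketch of one way they might be approached, not the author's solution. It builds a tiny in-memory database as a stand-in for `opioid.db`. The table names (`annual`, `population`), the column names (`countyfips`, `year`, `DOSAGE_UNIT`, `population`), the toy numbers, and the helper `make_scatter` are all assumptions made for illustration and should be replaced with whatever the real database contains.

```python
import sqlite3

import pandas as pd

# Hypothetical stand-in for opioid.db: table names, column names, and values
# are assumptions. With the real file, use sqlite3.connect("opioid.db").
con = sqlite3.connect(":memory:")
pd.DataFrame({
    "countyfips": ["01001", "01001", "01003", "01003"],
    "year": [2006, 2007, 2006, 2007],
    "DOSAGE_UNIT": [1000.0, 1500.0, 4000.0, 5000.0],
}).to_sql("annual", con, index=False)
pd.DataFrame({
    "countyfips": ["01001", "01001", "01003", "01003"],
    "year": [2006, 2007, 2006, 2007],
    "population": [100, 100, 200, 200],
}).to_sql("population", con, index=False)

# Read each table into its own dataframe.
annual = pd.read_sql_query("SELECT * FROM annual", con)
population = pd.read_sql_query("SELECT * FROM population", con)
con.close()

# Join on county and year, then compute pills per person in pandas.
merged = annual.merge(population, on=["countyfips", "year"], how="inner")
merged["pills_per_person"] = merged["DOSAGE_UNIT"] / merged["population"]

# Average across counties within each year.
avg_by_year = merged.groupby("year", as_index=False)["pills_per_person"].mean()


def make_scatter(df):
    """Return a plotly scatter (points only) of average pills per person by year."""
    # Imported here so the wrangling above runs even where plotly is not installed.
    import plotly.express as px
    return px.scatter(df, x="year", y="pills_per_person",
                      title="Average opioid pills per person by year")
```

Calling `make_scatter(avg_by_year).write_html("HW4Q4.html")` would then produce a standalone page that can sit next to the Sankey page in the public repo; the file name here is only a placeholder.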